Abstract

The key findings of classical population genetics are derived using a framework based on information theory, with the entropies of the allele frequency distribution as a basis. The common results for drift, mutation, selection, and gene flow will be rewritten in terms of information theoretic measurements and used to draw the classic conclusions for balance conditions and common features of one-locus dynamics. Linkage disequilibrium will also be discussed, including the relationship between $I$ and $r^2$.
1. Introduction

Population genetics and information theory both began to emerge in the first half of the 20th century. Population genetics, animated by the ongoing debate about the relationship between the theory of evolution driven by natural selection and the laws of Mendelian inheritance, became one of the foundations of modern biology. It enabled biologists to show how allele and genotype frequencies in a population are affected by processes such as mutation, selection, and genetic drift [1, 2, 3, 4, 5]. With the rise of the neutral theory of evolution [6, 7] and the genomics revolution, population genetics has helped supplement the insights gained from genetic data. It has been used to explain phenomena such as the ratio and rates of synonymous and non-synonymous substitutions, and how these can serve both as a molecular clock between species and as a way to identify positively or negatively selected genes [8]; to develop coalescent theory, which addresses the genealogical distance between populations separated by time but linked by a recent common ancestor [9]; and to study gene flow between genetically modified and wild organisms [10].

Information theory, though born a couple of decades after most of the initial insights of population genetics, has had an impact just as far reaching and important. Developed by the legendary Claude Shannon in the 1940s at Bell Labs, information theory enabled the communications revolution and the Internet, and revolutionized views of entropy and information, allowing information and information transfer rates to be quantified. In a tribute to its utility and expansive scope, information theory was later adopted by other disciplines to understand related or completely unrelated phenomena [12]. Some of the best known examples are the papers by Edwin Jaynes showing that much of equilibrium statistical mechanics can be derived using only information theory and assumptions of maximum entropy [13], [15]. This has led to the rise of the Maximum Entropy (MaxEnt) school of inquiry in statistics and the expanded use of information theory across a wide variety of the natural sciences. Interestingly enough, in his Ph.D. thesis, written at Cold Spring Harbor Laboratory, Shannon tackled the ideas of population genetics [16]. A concise and fascinating summary of his work is given by James Crow [17].

In this paper, it will be shown that there are deep links between quantitative population genetics and information theory. This will not be an abstract treatment with only a passing reference to biologically meaningful and important quantities. Nor is it an attempt to claim that the underlying mechanisms of evolution are based on a vaguely defined notion of "information" rather than on well-understood and recognized biological forces. Rather, this paper will show that the key results of population genetics can be understood by seeing that the evolution of allele frequencies in a population can be interpreted as a biological process whose mechanisms have exactly corresponding information theoretic measures, and that the techniques of information theory can shine new light on what these biological processes mean in aggregate as well as simplify the theoretical analysis of some evolutionary processes.



2. Preliminary Concepts 

There have been prior investigations of population genetics borrowing tools from information theory. First, for years some of the most popular metrics for measuring biological diversity have been borrowed from information theory [18, 19, 20]. It is difficult to say exactly when a role for entropy in population genetics was first tentatively raised. Moran investigated the entropy of general Markov processes [21] in a now almost forgotten paper written after his famous papers on birth-death population genetics models. Watterson approached the subject again a year later in his 1962 paper [22] on diffusion theories in population genetics. Towards the end of the paper, he calculated the entropy of the allele frequency distribution as a possible measure of the time for a population to completely lose one allele, decaying towards homozygosity. As will be shown later in this paper, he was correct: entropy directly determines the decay time of a population's heterozygosity under genetic drift. Other works seek to explain or derive aspects of population genetics using techniques involving Fisher information [23] or general methods of computation for nonlinear dynamical systems [24].
This treatment will be different, however, in showing that all evolutionary
forces can be consistently represented by information theoretic measurements 
in a comprehensive theory. In this paper, the focus will be on the two allele 
single locus model. Allele frequencies, p and q, will be represented as objective 
probabilities which will reflect the presence of the alleles amongst all loci in 
the population. The connections between population genetics and information 
theory will be made using several information-based quantities which will be 
defined in this section. 

First, and most famous in information theory, is the concept of Shannon entropy. For a random variable distribution with $n$ different states with probabilities $P(i)$, where $\sum_{i=1}^{n} P(i) = 1$, the entropy, $S$, of the distribution is defined by

$$ S = -\sum_{i=1}^{n} P(i) \log P(i) \tag{1} $$

The value of $S$ always ranges between a minimum of $0$ for the trivial distribution where an event occurs with probability 1 and a maximum of $S = \log n$ for the uniform distribution across all $n$ states. This simple definition of entropy belies the fact that entropy has several orders depending on the degrees of freedom defined in the distribution. The lowest order, the zeroth order entropy, $S_0$, is simply

$$ S_0 = \log n \tag{2} $$

and depends only on the number of possible states in the distribution. For the two allele model, $S_0 = \log 2$. The first order entropy, which will be referred to as $S$ without a subscript, is the traditional definition given in equation (1), and its maximum value is $S_0$. The second order entropy, $S_2$, the last relevant order for two allele models, is given by

$$ S_2 = -\sum_{i=1}^{n}\sum_{j=1}^{n} P(i,j) \log P(i,j) \tag{3} $$

The quantity $S_2$ is often described as the joint entropy and is based on the joint probability of states $i$ and $j$. In this paper, it will also be written as $S$ over two variables, e.g. $S(x,y)$. Similar to the relationship between $S_0$ and $S$, $S_2 \le S(i) + S(j) = 2S$.
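As a small numerical illustration of equations (1) to (3), the sketch below computes $S_0$, $S$, and $S_2$ for a hypothetical two allele locus with $p = 0.3$, using natural logarithms. It assumes the two alleles at a locus pair independently (Hardy-Weinberg proportions), in which case $S_2$ reaches its upper bound $2S$.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy -sum P log P in natural-log units; zero terms contribute nothing."""
    return -sum(x * math.log(x) for x in probs if x > 0)

p, q = 0.3, 0.7                        # hypothetical allele frequencies
S0 = math.log(2)                       # zeroth order entropy, equation (2) with n = 2
S = shannon_entropy([p, q])            # first order (allele) entropy, equation (1)
# Second order entropy over ordered allele pairs, equation (3), assuming the
# two alleles at a locus are paired independently (Hardy-Weinberg proportions).
S2 = shannon_entropy([p * p, p * q, q * p, q * q])
print(f"S0={S0:.4f}  S={S:.4f}  S2={S2:.4f}  2S={2 * S:.4f}")
```

Any joint table with the same marginals but correlated pairing would give $S_2 < 2S$.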

In addition to the measures of entropy, there will be two other useful quantities, the Kullback-Leibler divergence, $D$, and mutual information, $I$. The Kullback-Leibler divergence measures the difference between two probability distributions. For two distributions, $f$ and $g$, the divergence from $f$ to $g$ is defined by

$$ D(f,g) = \sum_x f(x) \log \frac{f(x)}{g(x)} \tag{4} $$
| Measure | Equation | Significance |
|---|---|---|
| $S_0$ | $\log n$ | Maximum entropy of an $n$ allele model |
| $S_1$ | $-\sum_{i=1}^{n} P(i)\log P(i)$ | Entropy of allele frequencies; key measure of change in allele frequencies over time |
| $S_2$ | $-\sum_{i=1}^{n}\sum_{j=1}^{n} P(i,j)\log P(i,j)$ | Entropy of loci based on allele pair frequencies; key measure of changes in genotype |
| Kullback-Leibler divergence | $\sum_x f(x)\log\frac{f(x)}{g(x)}$ | Used to model genetic drift |
| Mutual information | $\sum_{i=1}^{n}\sum_{j=1}^{n} P(i,j)\log\frac{P(i,j)}{P(i)P(j)}$ | Used to model selection and non-random mating |

Table 1: Key measures of information theory and their significance to the evolution of allele frequencies and genotypes in a population.



where $D \ge 0$. One important aspect of the divergence to note is that it is not a distance metric, since $D$ is not symmetric: in general $D(f,g) \ne D(g,f)$. Another way to express $D$ is

$$ D(f,g) = S_x(f,g) - S(f) \tag{5} $$

where $S_x$ is the cross entropy, $S_x(f,g) = -\sum_x f(x)\log g(x)$. The Kullback-Leibler divergence will be integral to our discussion of genetic drift.
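A minimal sketch of equations (4) and (5), using two hypothetical two-state distributions, shows both the identity $D(f,g) = S_x(f,g) - S(f)$ and the asymmetry $D(f,g) \ne D(g,f)$. The helper names are illustrative only.

```python
import math

def entropy(f):
    return -sum(fx * math.log(fx) for fx in f if fx > 0)

def cross_entropy(f, g):
    """S_x(f, g) = -sum_x f(x) log g(x)."""
    return -sum(fx * math.log(gx) for fx, gx in zip(f, g) if fx > 0)

def kl_divergence(f, g):
    """Equation (4); assumes g(x) > 0 wherever f(x) > 0."""
    return sum(fx * math.log(fx / gx) for fx, gx in zip(f, g) if fx > 0)

f = [0.5, 0.5]          # hypothetical distributions over two states
g = [0.9, 0.1]
d_fg, d_gf = kl_divergence(f, g), kl_divergence(g, f)
print(d_fg, d_gf)                                   # the two directions differ
print(cross_entropy(f, g) - entropy(f))             # equation (5) reproduces d_fg
```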

The mutual information, $I$, between two random variables, $i$ and $j$, represents the amount of entropy in one variable that can be accounted for given the other. Shannon first used it to measure the capacity of a channel by seeing how much of the message at the receiver could be determined by the input. Formally, $I$ is given by

$$ I = \sum_{i}\sum_{j} P(i,j) \log \frac{P(i,j)}{P(i)P(j)} \tag{6} $$

The mutual information also has an alternate formulation

$$ I = S(i) + S(j) - S(i,j) = S(i) - S(i \mid j) = S(j) - S(j \mid i) \tag{7} $$

Mutual information will be used to represent the effects of selection and non-random mating in populations. These quantities are shown for reference in Table 1 along with their relative significance.
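The sketch below, using a hypothetical 2 x 2 joint table in which like alleles pair more often than chance would predict, computes $I$ directly from equation (6) and checks it against the entropy form of equation (7).

```python
import math

def entropy(probs):
    return -sum(x * math.log(x) for x in probs if x > 0)

def mutual_information(joint):
    """Equation (6) for a joint table given as a list of rows."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    return sum(pij * math.log(pij / (rows[i] * cols[j]))
               for i, r in enumerate(joint)
               for j, pij in enumerate(r) if pij > 0)

joint = [[0.4, 0.1],    # hypothetical table: like alleles pair more often than chance
         [0.1, 0.4]]
I = mutual_information(joint)
rows = [sum(r) for r in joint]
cols = [sum(c) for c in zip(*joint)]
cells = [x for r in joint for x in r]
print(I, entropy(rows) + entropy(cols) - entropy(cells))   # equation (7)
```

A table that is the product of its marginals, such as Hardy-Weinberg proportions, gives $I = 0$.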

One assumption which will be used throughout the paper is that the allele entropy is extensive. Therefore, we are able to add the entropy changes caused by each evolutionary effect to obtain the net change in allele entropy each generation.
3. The Basic Model and Hardy-Weinberg Equilibrium



The basic model for the evolution of allele frequencies over time will be given by measuring the first order entropy change between generations. Given discrete generations, we can use the techniques of difference equations [25]. The entropy for alleles $p$ and $q$ at the current time is $S_t$, and the change is represented through the difference equation

$$ S_t = S_{t-1} + \Delta S \tag{8} $$

or

$$ \Delta S = S_t - S_{t-1} \tag{9} $$

The change in entropy $\Delta S$ is caused by the accumulated effects of evolutionary forces acting on the population. Hardy-Weinberg equilibrium (HWE) is the basic steady state assumption of genotype frequencies in populations not undergoing any evolutionary selection or non-random mating that would force genotype proportions to differ from those expected under random mating. Given that HWE is a statement of the frequencies of genotypes given allele frequencies, both the first order (allele) and second order (genotype) entropies will need to be used.

In the most trivial case, $\Delta S = 0$. In the presence of random mating (zero mutual information), this equality dictates a condition of maximum equilibrium. The distribution of genotypes given by $S(p,q)$ at maximum equilibrium was first expounded in a paper by Wang, Yuan, Guo et al. [26], in which they use Lagrange multiplier techniques to show that the distribution at maximum equilibrium given allele frequencies $p$ and $q$ is the Hardy-Weinberg equilibrium distribution for genotypes: $(p,p) = p^2$; $(p,q) = 2pq$; $(q,q) = q^2$. This was further developed by Zhang & Zhang [27], who expand the analysis to limited cases of multiple alleles. Unfortunately, both papers are only available in simplified Chinese at this time, but the mathematical portion of the first is shown in Appendix C. Their result can also be seen as a corollary of zero mutual information ($I = 0$) at maximum entropy: the term in the logarithm of equation (6) must then equal 1 for all terms, so that

$$ P(p,p) = p^2, \quad P(p,q) = pq, \quad P(q,p) = pq, \quad P(q,q) = q^2 \tag{10} $$
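The maximum entropy argument can be checked numerically. Assuming a symmetric table with $P(p,q) = P(q,p) = x$, so that the allele frequencies stay fixed at $p$ and $q$, the sketch below scans $x$ and reports where the joint entropy $S_2$ peaks; by the argument above it should sit at the Hardy-Weinberg value $x = pq$. The grid search is only an illustration and is not the Lagrange multiplier method of [26].

```python
import math

def joint_entropy(cells):
    return -sum(x * math.log(x) for x in cells if x > 0)

def genotype_table(p, x):
    """Ordered allele-pair probabilities [P(p,p), P(p,q), P(q,p), P(q,q)] with
    P(p,q) = P(q,p) = x; the allele frequencies p and q = 1 - p stay fixed."""
    q = 1 - p
    return [p - x, x, x, q - x]

p = 0.3                                                            # hypothetical frequency
steps = 30_000
candidates = [i * min(p, 1 - p) / steps for i in range(1, steps)]  # 0 < x < min(p, q)
best_x = max(candidates, key=lambda x: joint_entropy(genotype_table(p, x)))
print(best_x, p * (1 - p))
```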



4. Genetic Drift and Kullback-Leibler Divergence 

The first evolutionary force we will model from an information theoretic perspective is genetic drift. Genetic drift is the tendency of allele frequencies to change, up to the point of fixation of one allele, because of the statistical effect of sampling error in survival and reproduction. This leads to deviations from Hardy-Weinberg equilibrium and its assumption of stable allele frequencies. In effect, genetic drift is caused by stochastic deviations in the subsequent generation's allele frequencies. Genetic drift is very sensitive to the size of the population and usually only has significant effects on the order of $2N_e$ generations, where $N_e$ is the effective population size.

Genetic drift, while being a completely stochastic effect, eventually fixes one allele and eliminates the other in a two allele model over a long time span. Which allele is fixed is random, with the probability of fixation equal to the allele's frequency. This random drift, contrary to most diffusion in physical processes, which increases entropy, reduces the overall entropy of the allele frequency distribution until a steady state is reached where $S = 0$.

The theory of large deviations [28, 29] is a branch of probability theory which describes the probability of deviation of an empirical distribution from its expected theoretical distribution. In the theory of large deviations, the entropy function of the size of a deviation from an expected distribution can usually be represented by the Kullback-Leibler divergence. The divergence can also be connected to the probability $P$ of a deviation from the mean value by a formula due to Cramér:

$$ \lim_{N \to \infty} -\frac{1}{N} \ln P = D \tag{11} $$

where $N$ is the number of trials or particles in the system. In the paper referenced earlier by Watterson [22], he determines that the average time for a population with allele frequency $p$ to decay to homozygosity is roughly equal to the entropy. Here we will approximate using only the entropy, excluding the $-4N/N_e$ term that corrects for the ratio of effective to actual population size, and assume that the real and effective populations are equivalent and that there is no mutation. Therefore the continuous exponential decay of probability can be represented as

p = Pae-i (12) 

Normalizing in terms of generations, the probability for the population to 
decay to homozygosity in one generation is 

P - e-^^ (13) 

Remarkably, if we take $S_0 = 0$ and rewrite in terms of $\Delta S$, this expression is the Einstein fluctuation formula, with the ideal gas constant $R$ instead set to 1. In addition, assuming the limit approximation is valid and taking $N = 2N_e$ in equation (11), we can write the divergence as

$$ D = -\frac{1}{2N_e} \ln P \tag{14} $$
Finally, solving for $P$ in equation (11) and setting it equal to equation (13), the Einstein fluctuation formula, identifies the magnitude of the entropy change $\Delta S$ with $D$. Since $D$ is always positive and the net effect of genetic drift is to reduce entropy, $\Delta S$ is changed by subtracting $D$. The final expression for the entropy change by the divergence is

$$ \Delta S = -\frac{1}{2N_e} S \tag{15} $$
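Cramér's formula (11) can be illustrated with binomial sampling, the sampling scheme underlying Wright-Fisher drift. The sketch below, with hypothetical values $p = 0.5$ and a deviant sample frequency $a = 0.6$, computes the exact binomial tail probability and compares $-\frac{1}{N}\ln P$ with the divergence between the frequency distributions $(a, 1-a)$ and $(p, 1-p)$. The gap shrinks as $N$ grows, roughly like $(\ln N)/N$.

```python
import math

def kl_bernoulli(a, p):
    """D between the two-allele frequency distributions (a, 1-a) and (p, 1-p)."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def log_upper_tail(n, p, a):
    """log P(K >= ceil(a * n)) for K ~ Binomial(n, p), summed stably in log space."""
    k0 = math.ceil(a * n)
    terms = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(p) + (n - k) * math.log(1 - p)
             for k in range(k0, n + 1)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

p, a = 0.5, 0.6          # expected allele frequency and a deviant sample frequency
for n in (100, 1_000, 20_000):
    rate = -log_upper_tail(n, p, a) / n
    print(n, round(rate, 5), round(kl_bernoulli(a, p), 5))
```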

Given that $q = 1 - p$, we can represent $S$ in one variable. One of the key discoveries of this paper is that key approximations from population genetics can be derived when a linear approximation of $S$ is taken. Using the famous Mercator approximation of $\log x$ around 1, we can make the approximation $\log x \approx x - 1$. Therefore, the entropy can be written as

$$ S = -p\log p - (1-p)\log(1-p) \approx 2p(1-p) = 2pq \tag{16} $$

This shows that in the linear approximation, entropy is approximately the 
same magnitude as the heterozygosity frequency, h, in the population. Why 
is this important? Classical population genetics used various scale and linear 
approximations to deal with the balance equations since advanced nonlinear 
analysis techniques were then not available. The derivations below take advantage of this, showing that these same approximations can be understood as a linear, limiting case of a more general treatment based on entropy.
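To see how closely the linearization in equation (16) tracks the exact entropy, the sketch below tabulates both for a few hypothetical allele frequencies.

```python
import math

def allele_entropy(p):
    """S = -p log p - q log q for a two-allele locus (natural log)."""
    return -sum(x * math.log(x) for x in (p, 1 - p) if x > 0)

for p in (0.01, 0.1, 0.3, 0.5):
    S = allele_entropy(p)
    h = 2 * p * (1 - p)                 # linearized form, equation (16)
    print(f"p={p:.2f}  S={S:.4f}  2pq={h:.4f}  S/2pq={S / h:.3f}")
```

Because $-\log x \ge 1 - x$, the exact $S$ is never below $2pq$. The ratio is $2\ln 2 \approx 1.39$ at $p = 0.5$ and grows for rare alleles, which fits the view that entropy shares the behavior of heterozygosity rather than its exact value.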

Equation (15) thus becomes

$$ \Delta h \approx -\frac{1}{2N_e} h \tag{17} $$

The form of both equations (15) and (17) is the difference equation form for compound growth, and the solution for $h_t$ works out to be

$$ h_t = h_0 \left(1 - \frac{1}{2N_e}\right)^t \tag{18} $$

with a continuous time expression

$$ h_t = h_0 e^{-t/2N_e} \tag{19} $$

Both of these expressions give a half-life of heterozygosity of $t_{1/2} \approx 2N_e \ln 2$.
All of these results agree with the known decay of heterozygosity and genetic diversity caused by drift. As the heterozygosity approximation will often be used in this paper, a few caveats are needed. The entropy and heterozygosity are best used to look at similar behavior during the evolution of the population. This can lead to valid theoretical insight; however, entropy should not be used as a numerical proxy for the exact value of heterozygosity, as the two can differ in value while showing the same overall behavior. Also, the entropy-as-heterozygosity approximation is only valid when there is random mating and only the effects of drift, mutation, or migration are acting. As will be explained later, if the mutual information between alleles is positive, indicating selection or non-random mating, the second order entropy representing genotype frequencies will deviate from the value of $2S$, and the heterozygosity will not necessarily be represented by $2pq$.
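The decay law of equations (17) to (19) can be compared with a direct Wright-Fisher simulation. This sketch assumes NumPy, an ideal population with $N = N_e$, binomial sampling of $2N_e$ gene copies per generation, and hypothetical values $N_e = 50$ and $p_0 = 0.5$. It averages $2pq$ over many replicate populations, since the decay law describes the expected heterozygosity rather than that of any single population.

```python
import numpy as np

def mean_heterozygosity(Ne, p0, generations, replicates, seed=1):
    """Wright-Fisher drift with 2*Ne gene copies sampled binomially each generation.
    Returns the mean of 2pq across replicate populations, generation by generation."""
    rng = np.random.default_rng(seed)
    p = np.full(replicates, p0)
    history = [np.mean(2 * p * (1 - p))]
    for _ in range(generations):
        p = rng.binomial(2 * Ne, p) / (2 * Ne)
        history.append(np.mean(2 * p * (1 - p)))
    return np.array(history)

Ne, p0, T = 50, 0.5, 100                       # hypothetical population and run length
sim = mean_heterozygosity(Ne, p0, T, replicates=20_000)
t = np.arange(T + 1)
h0 = 2 * p0 * (1 - p0)
discrete = h0 * (1 - 1 / (2 * Ne)) ** t       # equation (18)
continuous = h0 * np.exp(-t / (2 * Ne))       # equation (19)
print(sim[[0, 50, 100]])
print(discrete[[0, 50, 100]])
print("half-life (generations):", 2 * Ne * np.log(2))
```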

4.1. Drift amongst More than Two Alleles

As will consistently be shown, one of the strengths of the entropy method is that it allows results to be generalized seamlessly under multiple conditions. One case here is the presence of more than two alleles at a locus. For example, consider the three allele model where one locus can have alleles $p$, $q$, and $r$, with $p + q + r = 1$. Using the log approximation for $S$ we have

$$ S \approx p(1-p) + q(1-q) + r(1-r) \tag{20} $$

Since $1 - p = q + r$, and similarly for $q$ and $r$,

$$ S \approx pq + pr + qp + qr + rp + rq = 2pq + 2pr + 2qr \tag{21} $$

This is equal to the total proportion of heterozygotes amongst all combinations of the three alleles. Therefore, in the three (or $n$) allele model, drift reduces the total proportion of all heterozygous combinations, just as it reduces heterozygosity in the two allele case. This matches the conclusion reached by Kimura in his analysis of drift at a multi-allelic locus [30], where he showed that total heterozygosity always decreases at a rate $1/2N_e$ per generation for any number of alleles. The magnitude of each heterozygous combination depends on the frequencies of its constituent alleles.
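A quick numerical check of equations (20) and (21), with hypothetical frequencies for three alleles, confirms that the linearized entropy equals the total heterozygosity $1 - \sum_i p_i^2$.

```python
from itertools import combinations

freqs = [0.5, 0.3, 0.2]                                        # hypothetical p, q, r
linear_S = sum(x * (1 - x) for x in freqs)                     # equation (20)
pairwise = sum(2 * a * b for a, b in combinations(freqs, 2))   # equation (21)
heterozygosity = 1 - sum(x * x for x in freqs)                 # 1 - total homozygosity
print(linear_S, pairwise, heterozygosity)
```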

4.2. The Diffusion Approximation

Finally, I will show that the diffusion approximation, first derived by Fisher [31, 32], expounded on by Wright [33], and widely popularized by Kimura [34, 35], can be derived from equation (15). The full derivation is shown in Appendix A.



5. Entropy Increases by Mutation 

The next process we will study is mutation, where the mutation rate per site per generation is represented by $\mu$. A general study of the overall nucleotide base entropy of infinite and finite length DNA sequences with single nucleotide polymorphisms (SNPs) was performed by Ma et al. [36]. In this paper we will instead look at the mutation rate for alleles and consider the overall entropy introduced to the allele frequency distribution. Mutation introduces genetic diversity and thus is an entropy increasing process. The entropy introduced by mutation is relatively straightforward:



$$ S_m = -\mu\log\mu - (1-\mu)\log(1-\mu) \tag{23} $$

Given that $\mu$ is usually very small, this overall effect is small. Taken in isolation, $\Delta S = S_m$, meaning that every generation there is a constant incremental entropy change linked to a probability $x = \mu$, which corroborates the neutral theory result that the rate at which neutral mutations become fixed in a population equals the mutation rate.

It is also possible to derive the expected results for drift-mutation balance and selection-mutation balance. Here we will treat drift-mutation balance. The overall change in entropy is represented by

$$ \Delta S = -\frac{1}{2N_e} S + S_m \tag{24} $$

At balance, the entropy of the allele frequency distribution remains constant, though the individual alleles are in a dynamic equilibrium. Thus, $\Delta S = 0$ and

$$ \frac{1}{2N_e} S = S_m \tag{25} $$

Again, we can simplify $S_m$ in a similar manner to equation (16), where $S_m \approx 2\mu(1-\mu)$. Given that $\mu \ll 1$, this can be further simplified to $S_m \approx 2\mu$. Substituting $2\mu$ for $S_m$ in equation (25) and again approximating $S$ as $h$, we derive the steady state heterozygosity at drift-mutation balance for the infinite sites model

$$ h = 4N_e\mu \tag{26} $$
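The sketch below evaluates equation (23) for a hypothetical per-site mutation rate and effective population size, then applies the balance condition (25) with $S_m \approx 2\mu$ to recover equation (26). It uses `math.log1p` for accuracy when $\mu$ is tiny.

```python
import math

def mutation_entropy(mu):
    """Equation (23): S_m = -mu log mu - (1 - mu) log(1 - mu)."""
    return -mu * math.log(mu) - (1 - mu) * math.log1p(-mu)

mu, Ne = 1e-8, 10_000             # hypothetical per-site mutation rate and effective size
S_m = mutation_entropy(mu)
print(S_m, 2 * mu * (1 - mu), 2 * mu)   # exact value and the two approximations used above

# Balance (25), S / (2 Ne) = S_m, with S approximated by h and S_m by 2 mu
h_balance = 2 * Ne * (2 * mu)
print(h_balance, 4 * Ne * mu)           # equation (26)
```

The exact $S_m$ exceeds $2\mu$ by roughly a factor of $(1 + \ln(1/\mu))/2$, because $\log x \approx x - 1$ is accurate only near $x = 1$; equation (26) rests on the same linearization used for $h$.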

We will return to a discussion of selection-mutation balance in the next 
section on mutual information. 



6. Mutual Information: Modeling Selection and Non-Random Mating

The evolutionary effects discussed so far, drift and mutation, assume random mating, so that allele frequencies are the main variable determining genotype frequencies. Here we will deal with violations of this assumption, normally caused by selection and inbreeding, which lead to differential survival and reproduction rates amongst genotypes.

Both of these effects, usually given separate treatments, are unified in that they are both causes of increased mutual information between alleles. In other words, the genotype frequencies will not reflect purely random combinations, and the allele and genotype frequencies will change across generations as a result.

As shown in equations (6) and (7), the genotype takes center stage in mutual information. Here we demonstrate the effect of mutual information on both allele frequencies and genotypes. The important quantity is the mutual information between alleles $p$ and $q$ between two generations due to the effects of selection or non-random mating. Following this,



$$ I(t,t-1) = S_t + S_{t-1} - S_t(p,q) \tag{27} $$

To find the allele entropy change, we subtract $2S_{t-1}$ from both sides of this equation

$$ I(t,t-1) - 2S_{t-1} = S_t - S_{t-1} - S_t(p,q) \tag{28} $$

and

$$ \Delta S = S_t - S_{t-1} = I(t,t-1) + S_t(p,q) - 2S_{t-1} \tag{29} $$

Unlike the Kullback-Leibler divergence, mutual information increases the entropy; however, combined with the $-2S_{t-1}$ term, the overall effect of selection is usually negative, reducing the overall diversity in the population except in cases where a relatively rare allele has a selective advantage. The change in entropy is also related to the value of the joint entropy between $p$ and $q$. When $S_t(p,q)$ is represented by HW proportions and no other evolutionary forces are acting, the right hand side reduces to $0$ and the population has a constant allele frequency. If the value $2S_{t-1} - S_t(p,q)$ is considered as a type of quasi-mutual information, $I'$, between the allele frequencies in generation $t-1$ and the genotype frequencies in generation $t$, then equation (29) can be restated as

$$ \Delta S = S_t - S_{t-1} = I - I' \tag{30} $$

This combination is not easily analytically tractable under most circumstances. $I'$ is not a formally defined quantity and differs from mutual information in many respects, one of which is that it can be negative. It is mainly used for conceptual and notational convenience. However, in most cases $S_{t-1} > S_t$, since there is an overall decrease in entropy between generations as the distribution of allele frequencies is changed by selection. Therefore, often $I' \gg I$, so that

$$ \Delta S \approx -I' \tag{31} $$

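$I'$ can typically be reduced to the following form, where $\bar{w} = p^2 w_{11} + 2pq\,w_{12} + q^2 w_{22}$ is the mean fitness, $p' = (p^2 w_{11} + pq\,w_{12})/\bar{w}$, and $q' = (q^2 w_{22} + pq\,w_{12})/\bar{w}$:

$$ I' = -2(p-p')\log p - 2(q-q')\log q + p^2\frac{w_{11}}{\bar{w}}\log\frac{w_{11}}{\bar{w}} + 2pq\frac{w_{12}}{\bar{w}}\log\frac{w_{12}}{\bar{w}} + q^2\frac{w_{22}}{\bar{w}}\log\frac{w_{22}}{\bar{w}} \tag{32} $$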
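or, with $w_{11} = 1$, $w_{12} = 1 - hs$, and $w_{22} = 1 - s$,

$$ I' = -2(p-p')\log p - 2(q-q')\log q + \frac{p^2}{\bar{w}}\log\frac{1}{\bar{w}} + 2pq\frac{1-hs}{\bar{w}}\log\frac{1-hs}{\bar{w}} + q^2\frac{1-s}{\bar{w}}\log\frac{1-s}{\bar{w}} \tag{33} $$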
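Because equation (32) is an algebraic rearrangement of $I' = 2S_{t-1} - S_t(p,q)$, the two forms can be compared numerically. The sketch below assumes viability selection acting on ordered allele pairs, so that the genotype frequencies after selection are $p^2 w_{11}/\bar{w}$, $pq\,w_{12}/\bar{w}$ (for each ordering), and $q^2 w_{22}/\bar{w}$, and uses hypothetical values under the $w_{11} = 1$, $w_{12} = 1 - hs$, $w_{22} = 1 - s$ parameterization.

```python
import math

def entropy(probs):
    return -sum(x * math.log(x) for x in probs if x > 0)

def mean_fitness(p, w11, w12, w22):
    q = 1 - p
    return p * p * w11 + 2 * p * q * w12 + q * q * w22

def quasi_mi_direct(p, w11, w12, w22):
    """I' = 2 S_{t-1} - S_t(p, q), from allele frequencies before selection and
    ordered allele-pair frequencies after viability selection."""
    q = 1 - p
    w_bar = mean_fitness(p, w11, w12, w22)
    after = [p * p * w11 / w_bar, p * q * w12 / w_bar,
             q * p * w12 / w_bar, q * q * w22 / w_bar]
    return 2 * entropy([p, q]) - entropy(after)

def quasi_mi_expanded(p, w11, w12, w22):
    """The expanded form of equation (32)."""
    q = 1 - p
    w_bar = mean_fitness(p, w11, w12, w22)
    p_new = (p * p * w11 + p * q * w12) / w_bar
    q_new = (q * q * w22 + p * q * w12) / w_bar

    def term(w):
        return (w / w_bar) * math.log(w / w_bar)

    return (-2 * (p - p_new) * math.log(p) - 2 * (q - q_new) * math.log(q)
            + p * p * term(w11) + 2 * p * q * term(w12) + q * q * term(w22))

s, h, p = 0.1, 0.5, 0.9                 # hypothetical selection, dominance, frequency
args = (p, 1.0, 1 - h * s, 1 - s)       # w11 = 1, w12 = 1 - hs, w22 = 1 - s
print(quasi_mi_direct(*args), quasi_mi_expanded(*args))
```

With equal fitnesses the genotypes stay in Hardy-Weinberg proportions and both forms give $I' = 0$.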



We can now solve for two cases of mutation-selection balance where $0 < s < 1$ and $h < 1/2$ [37] and the equilibrium is asymptotically stable. Equation (33) can be used to interpret the balance conditions for selection and mutation by setting $I' = 2\mu$. For example, take the case where the allele $q$ is selected against with a strength measured by $s$, and $p \approx 1$ as well as $w_{11} \approx \bar{w} \approx 1$. In addition, given that $q$ is the rare allele, $q - q' \approx \mu$. First, where $h = 0$,

$$ -2\mu\log q + q^2(1-s)\log(1-s) = 2\mu \tag{34} $$

and

$$ q^2(1-s)\log(1-s) = 2\mu(1 + \log q) \tag{35} $$

Using the log approximation for $1 - s$ and approximating $1 \gg s$,

$$ q^2 \approx -\frac{2\mu(1 + \log q)}{s} \tag{36} $$

Given that $q$ is likely very small, $\log q$ is a correspondingly large negative value. If $|\log q| > 1$ but the differences in orders of magnitude between $\mu$ and $s$ are large, we arrive at the familiar approximation

$$ \hat{q} \approx \sqrt{\frac{\mu}{s}} \tag{37} $$

Similar procedures apply when $h > 0$ and the recessive allele is present mainly in heterozygotes:

$$ 2q(1-hs)\log(1-hs) \approx 2\mu(1 + \log q) \tag{38} $$

and, assuming $1 \gg hs$ and with similar arguments regarding $\log q$,

$$ \hat{q} \approx \frac{\mu}{hs} \tag{39} $$

In conclusion, just like drift, the main conclusions of population genetics can 
be readily derived using the information theoretic description. 
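As an independent check on equations (37) and (39), the sketch below iterates the standard one-locus recursion (viability selection followed by one-way mutation from $p$ to $q$) to equilibrium for hypothetical values of $\mu$ and $s$. This recursion is the classical textbook model, not the entropy formulation above.

```python
import math

def equilibrium_q(mu, s, h, q0=0.05, generations=20_000):
    """Iterate viability selection (w11 = 1, w12 = 1 - hs, w22 = 1 - s)
    followed by one-way mutation from p to q at rate mu."""
    q = q0
    for _ in range(generations):
        p = 1 - q
        w_bar = p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)
        q_sel = (p * q * (1 - h * s) + q * q * (1 - s)) / w_bar   # after selection
        q = q_sel + mu * (1 - q_sel)                              # after mutation
    return q

mu, s = 1e-5, 0.1                       # hypothetical mutation rate and selection coefficient
q_recessive = equilibrium_q(mu, s, h=0.0)
q_partial = equilibrium_q(mu, s, h=0.5)
print(q_recessive, math.sqrt(mu / s))   # compare with equation (37)
print(q_partial, mu / (0.5 * s))        # compare with equation (39)
```

With this ordering of selection and mutation, the fully recessive equilibrium satisfies $s\hat{q}^2 = \mu$ exactly, and the $h = 0.5$ case lands within a relative error of order $\mu$ of $\mu/hs$.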
7. Entropy Changes due to Gene Flow

The final change in allele frequency we will deal with is the change in entropy due to migration into or out of the population under analysis. Here we will base our results on the simple Wright mainland-island model. To determine the total entropy change in the recipient (island) population, we must calculate the combined entropy of both the resident, identical-by-descent population and the immigrants. For an island population with allele frequencies $p$ and $q$, a proportion $m$ of which are migrants with allele frequencies $p^*$ and $q^*$ for the same alleles, we can calculate the combined entropy as below,

$$ S_{total} = S_{migrants} + S_{island} \tag{40} $$



$$ S_{total} = -\frac{p^* m 2N_e}{2N_e}\log\frac{p^* m 2N_e}{2N_e} - \frac{q^* m 2N_e}{2N_e}\log\frac{q^* m 2N_e}{2N_e} - \frac{p(1-m)2N_e}{2N_e}\log\frac{p(1-m)2N_e}{2N_e} - \frac{q(1-m)2N_e}{2N_e}\log\frac{q(1-m)2N_e}{2N_e} \tag{41} $$

$$ = -p^* m\log(p^* m) - q^* m\log(q^* m) - p(1-m)\log(p(1-m)) - q(1-m)\log(q(1-m)) \tag{42} $$

The logarithms can then be expanded to produce

$$ = -p^* m\log p^* - p^* m\log m - q^* m\log q^* - q^* m\log m - p(1-m)\log p - p(1-m)\log(1-m) - q(1-m)\log q - q(1-m)\log(1-m) \tag{43} $$

$$ = -m(p^*\log p^* + q^*\log q^*) - m\log m\,(p^* + q^*) - (1-m)(p\log p + q\log q) - (1-m)\log(1-m)\,(p + q) \tag{44} $$

Given that both $p^* + q^* = 1$ and $p + q = 1$, and defining $S = -p\log p - q\log q$ and $S^* = -p^*\log p^* - q^*\log q^*$, we finally reduce equation (44) to

$$ S_{total} = S + m(S^* - S) - m\log m - (1-m)\log(1-m) \tag{45} $$



Here we see a fortuitous derivation. With $m$ being the proportion of the population from migrants, we have derived the basic formulation for Gibbs' entropy of mixing,

$$ S_{mix} = -m\log m - (1-m)\log(1-m) \tag{46} $$

so our final expression for the total entropy after gene flow is

$$ S_{total} = S + m(S^* - S) + S_{mix} \tag{47} $$

and the entropy change is

$$ \Delta S = S_{total} - S = m(S^* - S) + S_{mix} \tag{48} $$

Note that the entropy change varies with $m(S^* - S)$, which is an exact analogue of the change in allele frequency during gene flow, $\Delta p = m(p^* - p)$. In the case where the immigrating population has the same allele frequency distribution as the island population, $S^* = S$ and we reduce to

$$ \Delta S = S_{mix} \tag{49} $$

This result recalls the paradox Gibbs confronted about 140 years ago in statistical mechanics. In this case, the amalgamation of two distinct populations can be modeled like the entropy change of mixing in thermodynamic processes, where the entropy of mixing and the weighted average entropy of the two populations combine to determine the entropy of the new combined population.

However, this raises a paradox. On one hand we have the classical result $\Delta p = m(p^* - p)$, so for populations with identical allele frequencies $\Delta p = 0$. On the other hand, the increase in entropy driven by the entropy of mixing directly predicts a change in the overall entropy, which necessitates a change in the allele frequencies. The resolution, when the proportion of immigration equals that of death and emigration, is that the entropy of mixing is offset by the entropy of "de-mixing" when the proportion $m$ of the population from the previous generation dies or migrates away, as the model implies. If we replace $p^*$ and $q^*$ in equation (44) with $p$ and $q$, we see that

$$ S_{total} = S + S_{mix} \tag{50} $$

so the proportion of the population leaving would cause a "de-mixing" of exactly the same magnitude. Therefore, when two populations with identical allele frequencies mix and the population size stays constant, $\Delta S = 0$, as expected. For two populations with differing allele frequencies

$$ \Delta S = m(S^* - S) \tag{51} $$
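Equation (47) is an instance of the chain rule for entropy, so it can be verified directly against equation (42). The sketch below uses hypothetical island and migrant frequencies and also checks the identical-frequency case of equation (50).

```python
import math

def entropy(probs):
    return -sum(x * math.log(x) for x in probs if x > 0)

p, p_star, m = 0.2, 0.7, 0.1      # hypothetical island frequency, migrant frequency, migrant proportion
q, q_star = 1 - p, 1 - p_star

# Equation (42): entropy over the four classes (migrant or resident) x (allele p or q)
S_total_direct = entropy([p_star * m, q_star * m, p * (1 - m), q * (1 - m)])

S = entropy([p, q])
S_star = entropy([p_star, q_star])
S_mix = entropy([m, 1 - m])                      # equation (46)
S_total_formula = S + m * (S_star - S) + S_mix   # equation (47)
print(S_total_direct, S_total_formula)
```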
Next, we will look at drift-migration balance. At balance, the entropy lost to drift, $\frac{1}{2N_e}S$, is offset by the entropy change from gene flow, $m(S^* - S)$, giving

$$ S = \frac{2N_e m}{1 + 2N_e m} S^* \tag{52} $$

and

$$ h = \frac{2N_e m}{1 + 2N_e m} h^* \tag{53} $$

which implies a fixation index, $F_{ST}$, of

$$ F_{ST} = \frac{1}{1 + 2N_e m} \tag{54} $$

This shows that under the simple model the observed heterozygosity is equal to the heterozygosity of the migrating population times $\frac{2N_e m}{1 + 2N_e m}$, which can be taken as an approximation of $1 - F_{ST}$. If $h^*$ is taken as the original heterozygosity of the island population, we can see that the standard balance for drift and migration for small $m$ becomes the observed heterozygosity equaling the expected heterozygosity times $1 - F_{ST}$.
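The balance in equations (52) and (54) can be reached by simply iterating the per-generation change, with drift removing $S/2N_e$ and gene flow adding $m(S^* - S)$. The sketch below assumes hypothetical values chosen so that $2N_e m = 1$.

```python
Ne, m, S_star = 500, 0.001, 0.6     # hypothetical effective size, migration rate, migrant entropy
S = 0.1                             # starting island entropy
for _ in range(100_000):
    # Drift removes S / (2 Ne) per generation; gene flow adds m (S* - S), equation (51)
    S += -S / (2 * Ne) + m * (S_star - S)
balance = 2 * Ne * m / (1 + 2 * Ne * m) * S_star   # equation (52)
F_st = 1 / (1 + 2 * Ne * m)                        # equation (54)
print(S, balance, F_st)
```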



8. Master Balance Equation 

From the foregoing discussions, and given that entropy is an extensive quantity whose total amount is additive, we can begin to look at the entire evolution of a population's allele and genotype frequencies in more detail. Specifically, the full equation for changes in the allele frequency is given by

$$ \Delta S = -D + (I - I') + S_m + S_f \tag{55} $$

where $S_f$ represents the change in entropy due to gene flow. As will be shown in the simulation results in the next section, the change in the allele frequencies in a population due to all evolutionary forces can be simulated using entropy and matched with simulated results. In addition, we can obtain a master balance equation subject to all forces when $\Delta S = 0$:

$$ D + I' - I = S_m + S_f \tag{56} $$

Expanding, assuming that $I' \gg I$ and $S_m = 2\mu$,

$$ \frac{1}{2N_e}S_{t-1} + 2S_{t-1} - S_t(p,q) = 2\mu + m(S^* - S_{t-1}) \tag{57} $$

$$ S_{t-1}\left(2 + \frac{1}{2N_e} + m\right) = 2\mu + S_t(p,q) + mS^* \tag{58} $$
In the case of no or balanced gene flow ($S^* = S$), we can reduce to

$$ S_{t-1}\left(2 + \frac{1}{2N_e}\right) = 2\mu + S_t(p,q) \tag{59} $$

and finally, substituting $h$ for $S_{t-1}$ in equation (58) when there is no mutual information (so that $S_t(p,q) \approx 2S_{t-1}$ cancels the factor of 2 on the left),

$$ h\left(\frac{1}{2N_e} + m\right) = 2\mu + mS^* \tag{60} $$

The terms on the left side largely represent population size (extensive) effects, while those on the right side represent intensive effects on allele frequencies due to mutation and gene flow. This gives a general equation for measuring either the change in heterozygosity or the change in other evolutionary parameters over a timescale where allele frequencies are relatively stable.

8.1. Changes in Entropy over Multiple Generations 

Obviously, there often may be a situation calling for the analysis of the 
change in entropy across multiple generations. Given the preceding equations, 
this can be done iteratively using computer simulation (as will be demonstrated 
in the next section) or in some limited cases, analytically. In particular, one 
can look at the entropy several generations into the future or past if certain 
assumptions are made regarding the stationarity of certain parameters. 

The easiest cases to deal with are genetic drift and mutation. By iterating the master equation with only drift and mutation, one can calculate the entropy $S_t$ given the entropy $d$ generations in the past, $S_{t-d}$, with the following equation

$$S_t = \left(1 - \frac{1}{2N_e}\right)^d S_{t-d} + 2\mu\sum_{i=0}^{d-1}\left(1 - \frac{1}{2N_e}\right)^i \tag{61}$$

For a large number of generations the first term goes to zero and the geometric series in the second term converges to $2N_e$, giving

$$S_t = 4N_e\mu \tag{62}$$
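The recursion behind equations (61) and (62) is easy to check numerically. This is a minimal sketch assuming, as in equation (57), that drift removes $D = S_{t-1}/(2N_e)$ per generation and mutation adds $S_m = 2\mu$; it compares generation-by-generation iteration with the closed form (the geometric series summed explicitly) and with the limit $4N_e\mu$. Function names and parameter values are hypothetical.

```python
def entropy_iterated(s_past: float, d: int, n_e: float, mu: float) -> float:
    """Apply Delta S = -S/(2 N_e) + 2 mu for d generations."""
    s = s_past
    for _ in range(d):
        s += -s / (2.0 * n_e) + 2.0 * mu
    return s


def entropy_closed_form(s_past: float, d: int, n_e: float, mu: float) -> float:
    """Eq. (61): (1 - 1/(2N_e))^d S_{t-d} + 2 mu * sum_{i=0}^{d-1} (1 - 1/(2N_e))^i."""
    decay = 1.0 - 1.0 / (2.0 * n_e)
    geometric_sum = (1.0 - decay**d) / (1.0 - decay)  # closed form of the series
    return decay**d * s_past + 2.0 * mu * geometric_sum


s0 = 0.673  # roughly the entropy (in nats) of p = 0.6
print(entropy_iterated(s0, 100, 250, 1e-5), entropy_closed_form(s0, 100, 250, 1e-5))
# After many generations the entropy approaches 4 N_e mu = 0.01, eq. (62).
print(entropy_closed_form(s0, 50_000, 250, 1e-5))
```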

Selection can be integrated into the analysis similarly. However, given the sometimes volatile nature of selection and how it integrates competition, mutualism, environment, and disease, among other variables, it is questionable whether a steady-state relative selection coefficient bears much resemblance to reality.

8.2. Boundary Conditions 

A final key feature that we must understand is the behavior of the entropy evolution equations at the two boundaries of minimum and maximum entropy, 0 and $\log n$ respectively. At $S = 0$, one allele becomes fixed and the other is lost, and therefore both genetic drift and mutual information disappear. Given that additional mutation and gene flow only act to increase entropy, the behavior of the equations is completely consistent at the boundary $S = 0$. For maximum entropy, by definition $\Delta S = 0$, and therefore, even though there will likely be mutation and migration pushing to increase entropy, there must be counterbalancing effects. Maximum entropy conditions can only arise in the absence of any selective pressure or non-random mating on allele frequencies or genotypes, and mutation and gene flow must balance with genetic drift. However, one could easily imagine a hypothetical large population in which genetic drift is negligible over appreciable time scales but entropy increases due to mutation and migration push the entropy to its maximum. In the next generation, the master evolution equation would seem to dictate that the entropy must increase above $\log n$. However, when allele frequencies are perfectly balanced at maximum entropy, incremental mutation or mixing from gene flow must necessarily reduce the entropy below its maximum value, and therefore, despite the general form of the equations, $S \le \log n$. Hence, at the boundaries we should define

5 = logn Sm = —^f^ 

(63) 

9. Simulation Results 

Throughout much of the paper, it has been asserted that the methods based on information theoretic quantities are as effective as those from theoretical population genetics models. This section tests that assertion by running a 1000-trial Monte Carlo simulation of the evolution of the allele frequencies and entropy of a 250-member population over 1000 generations. The results of the simulation are then compared with the evolution of the population predicted by the techniques derived in this paper, using the same fixed parameters. Note that all of these simulations use only the information theoretical parameters and not the Mercator approximation.

The following figures compare both the frequency of allele $p$ and the entropy $S$: the solid green line is the simulation output, produced with the Python version of SimuPOP, and the dashed red line is the output from the information theoretic method. For the example of pure drift-mutation, the heterozygosity proportion replaces $p$ on the y-axis.
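The drift-mutation comparison can be sketched without SimuPOP. The code below is a minimal stand-in and not the paper's simulation: it assumes a plain Wright-Fisher model with $2N_e$ gene copies, symmetric mutation applied deterministically before binomial resampling, and an illustrative mutation rate of $10^{-5}$ chosen for this sketch. It averages heterozygosity over Monte Carlo trials and compares it with the recursion $h_t = (1 - 1/(2N_e))h_{t-1} + 2\mu$ used by the entropy method. Selection cases are not covered, and all names are hypothetical.

```python
import numpy as np


def simulated_heterozygosity(n_e=250, mu=1e-5, p0=0.6, generations=1000, trials=1000, seed=0):
    """Mean heterozygosity 2p(1-p) over Monte Carlo Wright-Fisher trials."""
    rng = np.random.default_rng(seed)
    two_n = 2 * n_e                           # number of gene copies
    p = np.full(trials, p0)
    mean_het = np.empty(generations + 1)
    mean_het[0] = 2 * p0 * (1 - p0)
    for t in range(1, generations + 1):
        p = p * (1 - mu) + (1 - p) * mu       # symmetric mutation A <-> a
        p = rng.binomial(two_n, p) / two_n    # genetic drift: resample 2N_e copies
        mean_het[t] = np.mean(2 * p * (1 - p))
    return mean_het


def entropy_method_heterozygosity(n_e=250, mu=1e-5, p0=0.6, generations=1000):
    """Deterministic drift-mutation recursion h_t = (1 - 1/(2N_e)) h_{t-1} + 2 mu."""
    h = np.empty(generations + 1)
    h[0] = 2 * p0 * (1 - p0)
    for t in range(1, generations + 1):
        h[t] = (1 - 1 / (2 * n_e)) * h[t - 1] + 2 * mu
    return h


sim = simulated_heterozygosity()
pred = entropy_method_heterozygosity()
print(sim[[0, 250, 500, 1000]])
print(pred[[0, 250, 500, 1000]])
```

Replacing the deterministic recursion with the full entropy update (including the mutual information terms) would be needed to reproduce the selection cases in Tables 2 and 3.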
Table 2: Simulation and entropy method results for three cases of population evolution with fixed parameters. Each simulation shows a population of $N_e = 250$ with a mutation rate $\mu$ = 10~^ over 1000 generations. Each figure on the left side is the frequency of allele $p$ over time, with the exception of the first figure, which is the heterozygosity ratio. Each figure on the right is the allele frequency entropy over time. The first pair represents drift-mutation with no other evolutionary forces acting, with starting frequencies $p = 0.6$, $q = 0.4$. The second pair represents classic incomplete dominance with initial allele frequencies $p = 0.2$, $q = 0.8$ and $s = 0.75$, $h = 0.49$. The third pair is overdominance with starting values of $p = 0.6$, $q = 0.4$ and $s = -0.14$, $h = 2$.
Table 3: Simulation and entropy method results for two cases of underdominance with different starting frequencies for $p$. Each simulation shows a population of $N_e = 250$ with a mutation rate $\mu$ = 10~^ over 1000 generations. Each figure on the left side is the frequency of allele $p$ over time. Each figure on the right is the allele frequency entropy over time. Both pairs represent underdominance with fitness variables $s = 0.2$, $h = 2$. The first pair has starting frequencies $p = 0.2$, $q = 0.8$ and the second pair has starting frequencies $p = 0.8$, $q = 0.2$. Note that in both cases, though the evolution of $p$ is sensitive to its initial value, the value of $S$ is approximately the same.

10. Multiple Loci Models and Linkage Disequilibrium 

One of the key strengths of the information theoretical method is that it can easily be expanded to investigate systems with multiple alleles and multiple loci, even if the probabilities or outcomes are not analytically tractable. As the analysis is extended to multiple loci, mutual information calculations become more important and take center stage over almost all other considerations. A key example is the model of linkage disequilibrium. The standard measure of disequilibrium, D [sl, HI], between two loci with alleles A and B is

$$D = P(A,B) - P(A)P(B) \tag{64}$$
Other measures of linkage disequilibrium have already been devised using entropy [40], Kullback-Leibler divergence [41], and mutual information [42].

Unlike the coefficients of relative fitness, however, $D$ is a measure of the underlying deviation from equilibrium, not a coefficient for a causal agent of that disequilibrium. Therefore, we can relate $D$ to the mutual information among multiple loci. For the case of two loci, it is easy to derive $I$ from $D$. First, for two loci with alleles $A$, $a$ and $B$, $b$,



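$$D = P(A,B) - P(A)P(B) \tag{65}$$

$$D + P(A)P(B) = P(A,B) \tag{66}$$

$$D_{AB} = D_{ab} = -D_{Ab} = -D_{aB} = D \tag{67}$$

From this we can derive an expression for the mutual information

$$\begin{aligned} I ={}& \left(D + P(A)P(B)\right)\log\left(1 + \frac{D}{P(A)P(B)}\right) + \left(-D + P(A)P(b)\right)\log\left(1 - \frac{D}{P(A)P(b)}\right) \\ &+ \left(-D + P(a)P(B)\right)\log\left(1 - \frac{D}{P(a)P(B)}\right) + \left(D + P(a)P(b)\right)\log\left(1 + \frac{D}{P(a)P(b)}\right) \end{aligned} \tag{68}$$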



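Using the log approximation $\log(1 + x) \approx x$ and multiplying out, we can reduce the above to

$$I \approx D^2\left(\frac{1}{P(A)P(B)} + \frac{1}{P(A)P(b)} + \frac{1}{P(a)P(B)} + \frac{1}{P(a)P(b)}\right) \tag{69}$$

And finally

$$I \approx \frac{D^2}{P(A)P(a)P(B)P(b)} \tag{70}$$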

Equation (70) is also the exact equation for the alternate measure of linkage disequilibrium known as $r^2$. This shows that under the linear approximation $I \approx r^2$, and therefore $I$ is a roughly equivalent measure at the two-locus level. This fact has previously been discussed in papers on LD and entropy [i^l and mutual information [41]. Mutual information does have advantages, however, as the number of loci increases. First, it is not restricted to measuring linear dependence, as the correlation coefficient is. Second, it can consolidate into one metric the strength of the total relationship amongst all loci. Third and finally, it can be used to measure and compare the relative disequilibrium at different numbers of loci using the multivariate mutual information or interaction information (which can be negative). More work needs to be performed, however, to make sure it is a robust and clear measure of LD with its own advantages versus other measures.
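For concreteness, the quantities in equations (64) to (70) can be computed from the four haplotype frequencies. The sketch below uses a hypothetical helper and natural logarithms; it returns $D$, $r^2$, the exact mutual information, and the value obtained by replacing $\log x$ with $x - 1$ inside (68). The last of these equals $r^2$ identically, which is the content of (70). For small $D$, a second-order expansion of the logarithm instead puts the exact value near $r^2/2$, so the two measures are proportional rather than equal at that order; the comparison below makes the size of the linear approximation's error visible.

```python
import math


def ld_measures(f_AB: float, f_Ab: float, f_aB: float, f_ab: float):
    """Return (D, r2, I_exact, I_linear) for two biallelic loci from haplotype frequencies."""
    pA, pB = f_AB + f_Ab, f_AB + f_aB
    pa, pb = 1.0 - pA, 1.0 - pB
    D = f_AB - pA * pB                       # eq. (64)/(65)
    r2 = D * D / (pA * pa * pB * pb)         # right-hand side of eq. (70)
    # (observed haplotype frequency, product of allele frequencies) for each cell
    cells = [(f_AB, pA * pB), (f_Ab, pA * pb), (f_aB, pa * pB), (f_ab, pa * pb)]
    # exact mutual information in nats: sum P(i,j) log(P(i,j) / (P(i)P(j)))
    i_exact = sum(f * math.log(f / e) for f, e in cells if f > 0)
    # eq. (68) with log(x) replaced by x - 1 (the linear approximation)
    i_linear = sum(f * (f / e - 1.0) for f, e in cells)
    return D, r2, i_exact, i_linear


# p(A) = 0.6, p(B) = 0.3, D = 0.02
print(ld_measures(0.20, 0.40, 0.10, 0.30))   # D ~ 0.02, r2 ~ 0.0079, I_exact ~ 0.0040
```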



A full exposition of the entropy method applied to multiple loci is beyond the scope of this paper; however, we can quickly show that the increased gene diversity that genetic hitchhiking is often used to explain can also be explained using this method.

For a model with two loci with allele pairs $A$, $a$ and $B$, $b$, the expected second-order entropy of the system can be represented as

$$S(A,B) = S_A + S_B - I \tag{71}$$

The change in entropy due to selection can similarly be approximated with $I$ and $I'$, where

$$\Delta S = I - I' \approx r^2 - I' \tag{72}$$

Therefore, since $r^2$ is positive whenever there is any linkage disequilibrium, any level of linkage disequilibrium offsets the reduction of genetic diversity across sites by slowing the entropy decrease caused by selection.



11. How Useful is Entropy? 

One aspect of the paper left unmentioned is how we can go from values of entropy to the allele frequencies. For a two-allele model, this is relatively simple, given that one can search over values of $p$ for one whose entropy matches the calculated entropy within an allowable tolerance. However, note that for this entropy function (see Figure 1) there are two possible values of $p$ for every value of entropy, $p$ and $1 - p$. The solution is to have the software track the starting allele frequency $p$ and, using the subsequent entropy change, determine the direction of the change in $p$ according to whether $p > 0.5$ or $p < 0.5$. Since the difference equation is equivalent to a first-order differential equation, the entropy increases or decreases monotonically within any given time step and cannot "skip" from one solution past the maximum entropy to another with the same entropy in a single time step. Therefore, the value of $p$ closest to the starting value that satisfies the change in entropy is the solution.
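The search for $p$ described above can be sketched as a bisection restricted to the branch of the entropy curve on which the previous frequency lies ($p < 0.5$ or $p \ge 0.5$). Bisection is one convenient choice for the search and is an assumption of this sketch; the function names are hypothetical and natural logarithms are used.

```python
import math


def entropy(p: float) -> float:
    """First-order entropy of a two-allele model, in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def p_from_entropy(s: float, p_prev: float, tol: float = 1e-12) -> float:
    """Invert S(p) on the branch containing p_prev by bisection."""
    s = min(max(s, 0.0), math.log(2.0))      # entropy of two alleles lies in [0, log 2]
    increasing = p_prev < 0.5                 # S rises on [0, 0.5] and falls on [0.5, 1]
    lo, hi = (0.0, 0.5) if increasing else (0.5, 1.0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # On the rising branch a too-small entropy means the root lies to the right;
        # on the falling branch it means the root lies to the left.
        if (entropy(mid) < s) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


p_prev = 0.30
s_next = entropy(0.31)                   # entropy after one step, e.g. from the master equation
print(p_from_entropy(s_next, p_prev))    # ~0.31 (not 0.69), since p_prev < 0.5
```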

This highlights one of the key weaknesses of the method based on entropy and information theory: it can only calculate the structure of the distribution and does not differentiate between which alleles take which values in the frequency distribution. Therefore, any $n$-allele model can give $n!$ different possible matches between alleles and frequencies. These can only be distinguished by selecting one of the results as the most biologically feasible, often given the assumptions of the parameters for selection or non-random mating in the mutual information. Also, the overall entropy does not distinguish between alleles that are identical by descent; it reflects only the aggregate distribution.
Figure 1: Plot of first-order entropy $S$ vs. $p$.



Finally, though the author again cautions against undue speculation on the connections between evolutionary processes and information theory, it is fair to note that such links may help us to understand more profound connections between biological processes and information theory.

On the speculative side, a possible interesting result of this formalism would 
be an enhanced understanding of other types of evolution in systems that are 
not biological, but exhibit similar characteristics of discrete hereditary units 
which undergo forces that can be represented by the same information theoretic 
parameters. Be they artificial life simulations, malicious code evolution, or some other unimagined paradigm, such systems could display evolution similar to what we observe in nature without having the same basic constitution, underlying biology, genetic coding or inheritance mechanisms, or even organic compounds.
The universality of evolutionary processes could be deeper than we realize. A 
short excursion in this light is given in Appendix B. 

In conclusion, this paper has endeavored to show that the biological forces of 
evolution can be linked to an information theoretic representation that devises a 
comprehensive equation based on entropy that reproduces the commonly known 
features of evolutionary change in allele frequencies and genotypes. Whether 
this method will only reproduce what is already known in population genetics 
or produce new and unexpected insights that can be validated through genomic 
data will be an interesting question to be answered in future works. 
Appendix A. The Diffusion Approximation

The information theory techniques used to model genetic drift can be used 
to approximate diffusion. 

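Starting from pure drift we have

$$\Delta S = -\frac{S}{2N_e} \tag{A.1}$$

with its continuous-time version of

$$\frac{dS}{dt} = -\frac{S}{2N_e} \tag{A.2}$$

From the definition of entropy, $S = -\int f(x)\log f(x)\,dx$, we can transform equation (A.2) to

$$\frac{d}{dt}\int f(x)\log f(x)\,dx = -\frac{1}{2N_e}\int f(x)\log f(x)\,dx \tag{A.3}$$

$$\frac{\partial}{\partial t}\left(f(x)\log f(x)\right) = -\frac{1}{2N_e}f(x)\log f(x) \tag{A.4}$$

$$\left(1 + \log f(x)\right)\frac{\partial f(x)}{\partial t} = -\frac{1}{2N_e}f(x)\log f(x) \tag{A.5}$$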

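Next we can simplify this equation by using the derivatives of entropy with respect to $x$, $\frac{dS}{dx} = -f(x)\log f(x)$ and $\frac{d^2S}{dx^2} = -\left(1 + \log f(x)\right)\frac{df(x)}{dx}$, to obtain

$$\frac{d^2S}{dx^2}\frac{\partial f(x)}{\partial t} = -\frac{1}{2N_e}\frac{df(x)}{dx}\frac{dS}{dx} \tag{A.6}$$

Now we come close to the conclusion: given the approximation $S \approx 2x(1 - x)$, we see that

$$\frac{d^2S}{dx^2} = -4 \tag{A.7}$$

to give

$$\frac{\partial f(x)}{\partial t} = \frac{1}{8N_e}\frac{df(x)}{dx}\frac{d\left[2x(1-x)\right]}{dx} \tag{A.8}$$

and

$$\frac{\partial f(x)}{\partial t} = \frac{1}{4N_e}\frac{df(x)}{dx}\frac{d\left[x(1-x)\right]}{dx} \tag{A.9}$$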

This is not the final diffusion equation. To complete this derivation we borrow from the approximate solution to $f(x,t)$ that Kimura derived in [s^ ]

$$f(x,t) \approx 6p(1-p)e^{-t/2N_e} \tag{A.10}$$

showing that $f(x,t)$ depends only on $p$, the probability at $t = 0$, and on $t$; therefore we can treat $f(x)$ as not dependent on $x$,

and finally

$$\frac{\partial f(x)}{\partial t} = \frac{1}{4N_e}\frac{\partial^2}{\partial x^2}\left[x(1-x)f(x)\right] \tag{A.11}$$



Appendix B. Channel Capacity of Genetic Information Inheritance 
in a Population 

This section is put into an appendix as an interesting mathematical excur- 
sion. Throughout this paper, I have striven to represent only the concepts most 
familiar to population genetics and which would prove most useful to theorists 
and practitioners whose main motivation is not just intellectual excursion but 
approaching real problems. 

This section will be more speculative, but in short, the use of information theory to reproduce population genetics opens avenues not only to more easily represent previously mathematically difficult concepts, but also to couple previously unrelated ideas using information theory as a bridge. Claude Shannon defined channel capacity in terms of entropy in his landmark paper. The channel capacity is the maximum rate at which a signal made up of symbols, each with its own probability, can be transmitted. For a population, the channel can be represented as the combined effects of mating and offspring fitness, with the next generation's allele frequencies reflecting those of the previous generation as modified by evolutionary effects. There have been previous efforts to calculate the channel capacity of genetic information. Two investigations at the level of the individual genome, rather than the population level considered in this paper, were done by Watkins [4^, 46].



His investigation focuses on the channel capacity for the transmission of infor- 
mation at the level of the individual genome by selection (natural or artificial) 
given the genome length, allele distribution, and population size. Near p = 0.5 
he finds this channel capacity is directly proportional to the genome length L. 
He also shows that sexual reproduction allows a higher channel capacity than 
asexual reproduction. A general idea of a channel capacity for evolution was 
also raised by science fiction writer Jonathan vos Post. 

In terms of a source entropy $H(X)$, a noiseless channel has a capacity

$$C = NH(X) \tag{B.1}$$

where $N$ is the number of symbols per unit time. For a channel with noise, this capacity is equal to the entropy of the source minus the conditional entropy of the source given the received signal $Y$, termed $H_Y(X)$:

$$C = N\left(H(X) - H_Y(X)\right) \tag{B.2}$$

For a simple type of noise that changes a character with a fixed random probability, for example a bit flip from 1 to 0, or a mutation of an allele $p$ to $q$ or vice versa as seen in SNPs, the channel capacity is the entropy of the source minus the conditional entropy. Note that the natural logarithm is used in all equations, but the channel capacity in bits per unit time can be obtained by dividing the result by $\log 2$. For two alleles $p$ and $q$ that have a probability $\mu$ of mutating into each other, this channel capacity in a population of diploid organisms can be represented as

$$C = 2N_e\left(-p\log p - q\log q + \mu\log\mu + (1-\mu)\log(1-\mu)\right) \tag{B.3}$$

or

$$C = 2N_e(S - S_m) \tag{B.4}$$

Using approximations, this equation can also be represented as

$$C = 2N_e(S - 2\mu) \approx 2N_e(h - 2\mu) \tag{B.5}$$

In other words, the channel capacity is represented by the entropy of the allele frequencies minus twice the mutation rate. So increasing the genetic diversity seems to increase channel capacity, while mutation, which ironically also increases the genetic diversity, reduces it. Again, $S \approx h$ is only valid when $I = I' = 0$.
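Equation (B.3) can be evaluated directly. The sketch below is a minimal illustration with hypothetical function names; it computes $S_m$ as the entropy of the symmetric mutation noise exactly as written in (B.3), rather than through the $2\mu$ approximation of (B.5), and uses natural logarithms with an optional conversion to bits by dividing by $\log 2$.

```python
import math


def binary_entropy(x: float) -> float:
    """-x log x - (1-x) log(1-x) in nats; zero at the endpoints."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def channel_capacity(n_e: float, p: float, mu: float, bits: bool = False) -> float:
    """Eq. (B.3)/(B.4): C = 2 N_e (S - S_m) for a diploid population of size N_e.

    S   -- entropy of the allele frequencies (p, 1 - p)
    S_m -- entropy of symmetric mutation noise at rate mu
    """
    c = 2.0 * n_e * (binary_entropy(p) - binary_entropy(mu))
    return c / math.log(2.0) if bits else c


print(channel_capacity(250, 0.5, 0.0, bits=True))  # 500 gene copies x 1 bit each
print(channel_capacity(250, 0.6, 1e-5))            # mutation noise lowers the capacity
```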

Now we will calculate the channel capacity in special cases of population balance. For Hardy-Weinberg equilibrium, $C = 2N_eS$ perpetually. In addition, with mutation the channel capacity is 0 for $S \approx h = 2\mu$. This should be expected for a signal-to-noise level of 0. Since in a completely homozygous population mutation would introduce heterozygosity, an approximate signal-to-noise ratio can be hypothesized as

$$\mathrm{SNR} = \frac{S - 2\mu}{2\mu} = \frac{pq}{\mu} - 1 \tag{B.6}$$

where the $pq/\mu - 1$ approximation is only valid when there is no selection or non-random mating. For an $S$ several orders of magnitude larger than $\mu$ we can simply state

$$\mathrm{SNR} = \frac{S}{2\mu} \approx \frac{pq}{\mu} \tag{B.7}$$

$S/\mu$ may also be an acceptable approximation at these orders of magnitude.
Now consider more detailed situations in which the entropy is known under balance conditions. For drift-mutation balance we have

$$C = 2N_e(h - 2\mu) = 2N_e(4N_e\mu - 2\mu) \tag{B.8}$$

$$C = 4N_e\mu(2N_e - 1) \approx 8N_e^2\mu \tag{B.9}$$

With drift and mutation balancing, we see several interesting effects. First, the channel capacity increases with the square of the effective population size, much faster than the linear dependence on $2N_e$ in equation (B.4). More surprisingly, the channel capacity becomes directly proportional to the mutation rate, so in this case increasing the rate of mutation actually increases the channel capacity. This is to be expected, since in drift-mutation balance mutation is the only force maintaining variation, which drift would otherwise push to zero over time. Migration-drift balance, using $h = 2N_emh^*/(1 + 2N_em)$, gives

$$C = 2N_e\left(\frac{2N_emh^*}{1 + 2N_em} - 2\mu\right) \tag{B.10}$$

$$C = \frac{4N_e\left(N_emh^* - \mu(1 + 2N_em)\right)}{1 + 2N_em} \tag{B.11}$$



For the case of selection-mutation balance, one can show, given $I' = S_{t-1} - S_t(p|q) = 2\mu$, that the channel capacity is proportional to the conditional entropy

$$C = 2N_eS(p|q) \tag{B.12}$$



This is an especially interesting result. Under all conditions $S(p|q) \le S$, which shows that natural selection acts on the channel capacity as a filter: stronger selection (lower conditional entropy) reduces the channel capacity, limiting the amount of variation that can propagate between generations in a population and discarding $S - S(p|q)$ of it. When natural selection is balanced with mutation, its effects cancel out the mutation effects, so the channel capacity depends only on the conditional entropy and the induced effects of selection (or non-random mating). In fact, in all of these examples, the channel capacity divided by the number of alleles ($2N_e$) is the maximum entropy the population allele frequencies can maintain between generations without change. By definition, any source entropy rate above the channel capacity cannot be transmitted without error, and in this case the allele frequencies would be forced to change back towards the entropy representing the channel capacity. For drift-mutation balance this is $S = 4N_e\mu$, and for selection-mutation balance it is $S = S(p|q)$.

Finally, the balance for the combined effects of all forces gives

$$C = 2N_e\left(\frac{2\mu + S_t(p|q) + mS^*}{1 + \frac{1}{2N_e} + m} - 2\mu\right) \tag{B.13}$$

reducing to equation (B.12) for large populations with no gene flow or drift.
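The balance-condition capacities above can be evaluated side by side. This sketch, with hypothetical function names, simply codes (B.8), (B.10), and (B.13) as written, taking $C = 2N_e(S - 2\mu)$ with the balance entropy supplied by the corresponding equation; the conditional entropy $S_t(p|q)$ and the migrant-pool value $h^*$ are treated as given inputs rather than computed.

```python
def capacity_drift_mutation(n_e: float, mu: float) -> float:
    """Eq. (B.8): C = 2 N_e (h - 2 mu) with h = 4 N_e mu."""
    return 2.0 * n_e * (4.0 * n_e * mu - 2.0 * mu)


def capacity_migration_drift(n_e: float, m: float, h_star: float, mu: float) -> float:
    """Eq. (B.10): C = 2 N_e (h - 2 mu) with h = 2 N_e m h* / (1 + 2 N_e m)."""
    h = 2.0 * n_e * m * h_star / (1.0 + 2.0 * n_e * m)
    return 2.0 * n_e * (h - 2.0 * mu)


def capacity_all_forces(n_e: float, mu: float, s_cond: float,
                        m: float = 0.0, s_star: float = 0.0) -> float:
    """Eq. (B.13): balance entropy from eq. (58) inserted into C = 2 N_e (S - 2 mu)."""
    s = (2.0 * mu + s_cond + m * s_star) / (1.0 + 1.0 / (2.0 * n_e) + m)
    return 2.0 * n_e * (s - 2.0 * mu)


# (B.9): C = 4 N_e mu (2 N_e - 1), close to 8 N_e^2 mu for large N_e.
print(capacity_drift_mutation(250, 1e-5))
# Large population, no gene flow: C / (2 N_e) approaches S(p|q), as in (B.12).
print(capacity_all_forces(1e9, 1e-5, 0.3) / 2e9)
```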



Appendix C. Maximum Entropy and Hardy-Weinberg Equilibrium

Since many researchers may not have access to, or be able to translate, the paper by Wang et al. [26] that derives HWE from the maximum joint entropy of an allele distribution, a short derivation is included below.

The quantity to be maximized is $S_2$,

$$S_2 = -\sum_{i=1}^{n}\sum_{j=1}^{n}P(i,j)\log P(i,j) \tag{C.1}$$

subject to the constraints

$$\sum_{i=1}^{n}\sum_{j=1}^{n}P(i,j) = 1 \tag{C.2}$$

$$\frac{1}{2}\sum_{j=1}^{n}\left(P(i,j) + P(j,i)\right) = P(i) \tag{C.3}$$

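Using the method of Lagrange multipliers, we can create a Lagrange function,

$$G = -\sum_{i=1}^{n}\sum_{j=1}^{n}P(i,j)\log P(i,j) + (\ln\lambda_0 + 1)\left(\sum_{i=1}^{n}\sum_{j=1}^{n}P(i,j) - 1\right) + \sum_{i=1}^{n}\ln\lambda_i\left(\frac{1}{2}\sum_{j=1}^{n}\left(P(i,j) + P(j,i)\right) - P(i)\right) \tag{C.4}$$

Taking $\partial G/\partial P(i,j) = 0$ we get

$$\ln P(i,j) - \ln\lambda_0 - \frac{1}{2}\left(\ln\lambda_i + \ln\lambda_j\right) = 0 \tag{C.5}$$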



which solves to

$$P(i,j) = \lambda_0\sqrt{\lambda_i\lambda_j} \tag{C.6}$$

Using the indexes we can easily see that $P(1,1) = \lambda_0\lambda_1$, $P(1,2) = P(2,1) = \lambda_0\sqrt{\lambda_1\lambda_2}$, and $P(2,2) = \lambda_0\lambda_2$.

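Our constraint equations can thus be restated as follows:

$$\begin{aligned} \lambda_0\left(\lambda_1 + \lambda_2 + 2\sqrt{\lambda_1\lambda_2}\right) &= 1 \\ \lambda_0\left(\lambda_1 + \sqrt{\lambda_1\lambda_2}\right) &= P(1) \\ \lambda_0\left(\lambda_2 + \sqrt{\lambda_1\lambda_2}\right) &= P(2) \end{aligned} \tag{C.7}$$

Since the first constraint reads $\lambda_0\left(\sqrt{\lambda_1} + \sqrt{\lambda_2}\right)^2 = 1$, the second gives $P(1) = \sqrt{\lambda_0\lambda_1}$ and the third gives $P(2) = \sqrt{\lambda_0\lambda_2}$. Hence $P(1,1) = \lambda_0\lambda_1 = P(1)^2$, $P(2,2) = \lambda_0\lambda_2 = P(2)^2$, and $P(1,2) = P(2,1) = \lambda_0\sqrt{\lambda_1\lambda_2} = P(1)P(2)$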